Abstract


We present simple algorithms for achieving self-stabilizing location management and routing in mobile 
ad-hoc networks. While mobile clients may be susceptible to corruption and stopping failures, mobile 
networks are often deployed with a reliable GPS oracle, supplying frequent updates of accurate real time 
and location information to mobile nodes. Information from a GPS oracle provides an external, shared 
source of consistency for mobile nodes, allowing them to label and timestamp messages, and hence aiding 
in identification of, and eventual recovery from, corruption and failures. Our algorithms use a GPS oracle. 

Our algorithms also take advantage of the Virtual Stationary Automata programming abstraction, 
consisting of mobile clients, virtual timed machines called virtual stationary automata (VSAs), and a 
local broadcast service connecting VSAs and mobile clients. VSAs are distributed at known locations 
over the plane, and emulated in a self-stabilizing manner by the mobile nodes in the system. They serve 
as fault-tolerant building blocks that can interact with mobile clients and each other, and can simplify 
implementations of services in mobile networks. 

We implement three self-stabilizing, fault-tolerant services, each built on the prior services: (1) VSA-to-VSA
geographic routing, (2) mobile client location management, and (3) mobile client end-to-end
routing. We use a greedy version of the classical depth-first search algorithm to route messages between 
VSAs in different regions. The mobile client location management service is based on home locations: 
Each client identifier hashes to a set of home locations, regions whose VSAs are periodically updated with 
the client’s location. VSAs maintain this information and answer queries for client locations. Finally, the 
VSA-to-VSA routing and location management services are used to implement mobile client end-to-end 
routing. 
1 Introduction


A system with no fixed infrastructure in which mobile clients may wander in the plane and assist each 
other in forwarding messages is called an ad-hoc network. The task of designing algorithms for constantly 
changing networks is difficult. Highly dynamic networks, however, are becoming increasingly prevalent, 
especially in the context of pervasive and ubiquitous computing, and it is therefore important to develop 
and use techniques that simplify this task. In addition, mobile nodes in these networks may suffer from 
crash failures or corruption faults, which cause arbitrary changes to their program states. Self-stabilization 
[4, 5] is the ability to recover from an arbitrarily corrupt state. This property is important in long-lived, 
chaotic systems where certain events can result in unpredictable faults. For example, transient interference 
may disrupt wireless communication, violating our assumptions about the broadcast medium. 


Mobile networks are often deployed in conjunction with “reliable” GPS services, supplying frequent
updates of real time and region information to mobile nodes. While the mobile clients may be susceptible to 
corruption and stopping failures, the GPS service may not be. Each of our algorithms utilizes such a reliable 
GPS oracle. Information from this oracle provides an external, shared source of consistency for mobile nodes, 
allowing them to label and timestamp their messages, and hence, aiding in identification of, and recovery 
from, corruption and stopping failures. 


In this paper we describe self-stabilizing algorithms that use a reliable GPS oracle to provide geographic 
routing, a mobile client location management service, and a mobile client end-to-end routing service. Each 
service is built on the prior services such that the composition of the services remains self-stabilizing [11]. 
In order to route location information between geographic regions, we use a greedy version of the classical 
depth-first search algorithm. This service is then used to help implement the location management service; 
each mobile client identifier hashes to a set of home locations, geographical regions that are periodically 
updated with the location of the client, and that are responsible for then answering queries about the 
client’s location. Both of these services are then used to implement point-to-point routing between mobile 
clients in the network. 


In order to simplify the implementations of the location management and routing services, we mask the 
unpredictable behavior of mobile nodes by using a self-stabilizing virtual infrastructure, consisting of mobile 
client automata, timing-aware and location-aware machines at fixed locations, called Virtual Stationary 
Automata (VSAs) [8, 9], that mobile clients can interact with and use to coordinate their actions, and a local 
broadcast service connecting VSAs and mobile clients. 


Self-stabilization and GPS oracles. Traditionally, studies of self-stabilizing systems are concerned with 
those systems that can be started from arbitrary configurations and eventually regain consistency without 
external help. However, mobile clients often have access to some reliable external information from a service 
such as GPS. Each of our algorithms in this paper uses an external GPS service (or an equivalent service) 
as a reliable GPS oracle, providing periodic real time clock and location updates, to base stabilization upon;
our algorithms use timestamps and location information to tag events. In an arbitrary state, recorded 
events may have corrupted timestamps. Corrupted timestamps indicating future times can be identified 
and reset to predefined values; new events receive newer timestamps than any in the arbitrary initial state. 
This eventually allows nodes in the system to totally order events. We use the eventual total order to 
provide consistency of information and distinguish between incarnations of activity (such as retransmissions 
of messages). 
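
The timestamp repair described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the names Event, RESET_TS, repair_timestamps, and total_order are hypothetical, the reset value 0.0 stands in for the unspecified "predefined values", and ties are broken by label purely for determinism.

```python
from dataclasses import dataclass

# Hypothetical reset value; the text only says corrupted timestamps are
# "reset to predefined values".
RESET_TS = 0.0


@dataclass
class Event:
    label: str
    ts: float  # timestamp taken from the GPS-supplied clock


def repair_timestamps(events, now):
    """Reset every timestamp that lies in the future relative to GPS time `now`."""
    return [Event(e.label, e.ts if e.ts <= now else RESET_TS) for e in events]


def total_order(events):
    """Order events by timestamp; ties are broken by label (an assumed rule)."""
    return sorted(events, key=lambda e: (e.ts, e.label))
```

Once the corrupted entries are reset, any event tagged with the current GPS time sorts after everything that was present in the arbitrary initial state.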
Virtual Stationary Automata programming layer. In prior work [8, 7, 6], we developed a notion of 
“virtual nodes” for mobile ad hoc networks. A virtual node is an abstract, relatively well-behaved active node 
that is implemented using less well-behaved real physical nodes. The GeoQuorums algorithm [7] proposes 
storing data at fixed locations; however it only supports atomic objects, rather than general automata. 
A more general virtual mobile automaton is suggested in [6]. Finally, the virtual automata presented in 
[8, 9] (and used here) are more powerful than those of [6], providing timing capabilities needed for many 
applications. These automata are stationary and arranged in a connected pattern similar to that of a 
traditional wired network. 

The static infrastructure we use in this paper includes fixed, timed virtual machines with an explicit notion 
of real time, called Virtual Stationary Automata (VSAs), distributed at known locations over the plane [8, 9]. 
Each VSA represents a predetermined geographic area and has broadcast capabilities similar to those of the
mobile physical nodes, allowing nearby VSAs and mobile nodes to communicate with one another. Many
algorithms depend significantly on timing, and it is reasonable to assume that many mobile nodes have access 
to reasonably synchronized clocks. In the VSA layer, VSAs also have access to virtual clocks, guaranteed 
to not drift too far from real time. The layer provides mobile nodes with a fixed virtual infrastructure, 
reminiscent of more traditional and better understood wired networks, with which to coordinate their actions. 


Our clock-enabled VSA layer is emulated by physical mobile nodes in the network. Each physical node is 
periodically informed of its region by the GPS. A VSA for a particular region is then emulated by a subset of
the mobile nodes in its region: the VSA state is maintained in the memory of the physical nodes emulating 
it, and the physical nodes perform VSA actions on behalf of the VSA. If no physical nodes are in the region, 
the VSA fails; if physical nodes later arrive, the VSA restarts. 

An important property of the VSA layer implementation described in [8, 9] is that it is self-stabilizing. 
Corruption failures at physical nodes can result in inconsistency in the emulation of a VSA. Our implementation,
however, can recover after corruptions to correctly emulate a VSA. To algorithms running on the VSA
layer, the VSA simply appears to suffer from a corruption.

Geographic/VSA-to-VSA routing. A basic service running on the VSA layer that we describe and
use repeatedly is that of VSA-to-VSA (region-to-region) routing (VtoVComm), providing a form of geocast.
GeoCast algorithms [24, 3], GOAFR [19], and algorithms for “routing on a curve” [23] route messages
based on the location of the source and destination, using geography to deliver messages efficiently. GPSR
[17], AFR [20], GOAFR+ [19], polygonal broadcast [10], and the asymptotically optimal algorithm [20] 
are algorithms based on greedy geographic routing algorithms, forwarding messages to the neighbor that is 
geographically closest to the destination. The algorithms also address “local minimum situations”, where the 
greedy decision cannot be made. GPSR, GOAFR+, and AFR achieve, under reasonable network behavior, a 
linear order expected cost in the distance between the sender and the receiver. We implement VSA-to-VSA 
routing using a persistent greedy depth-first search (DFS) routing algorithm that runs on top of the VSA 
layer’s fixed infrastructure. Our scheme is an application of the classical DFS algorithm in a new setting. 
Location management. Finding the location of a moving client in an ad-hoc network is difficult, much 
more so than in cellular mobile networks where a fixed infrastructure of wired support stations exists (as in
[16]), or in sensor networks where some approximation of a fixed infrastructure may exist [2]. A location 
service in ad-hoc networks is a service that allows any client to discover the location of any other client 
using only its identifier. The basic paradigm for location services that we use here is that of a home location 
service: Hosts called home location servers are responsible for storing and maintaining the location of other 
hosts in the network [1, 14, 21]. Several ways to determine the sets of home location servers, both in the 
cellular and entirely ad-hoc settings, have been suggested. 

The locality aware location service (LLS) in [1] for ad-hoc networks is based on a hierarchy of lattice 
points for destination nodes, published with locations of associated nodes. Lattice points can be queried 
for the desired location, with a query traversing a spiral path of lattice nodes increasingly distant from the 
source until it reaches the destination. Another way of choosing location servers is based on quorums. A set 
of hosts is chosen to be a write quorum for a mobile client and is updated with the client’s location. Another 
set is chosen to be a read quorum and queried for the desired client location. Each write and read quorum 
has a nonempty intersection, guaranteeing that if a read quorum is queried, the results will include the latest 
location of the client written to a write quorum. In [14], a uniform quorum system is suggested, based on a 
virtual backbone of quorum representatives. Geographic quorums based on the focal points abstraction are 
suggested in [7]. 

Location servers can also be chosen using a hash table. Some papers [21, 15, 25] use geographic locations 
as a repository for data. These use a hash to associate each piece of data with a region of the network and 
store the data at certain nodes in the region. This data can then be used for routing or other applications. 
The Grid location service (GLS) [21] maps client ids to geographic coordinates. A client C_p’s location is
saved by clients closest to the coordinates that p hashes to.

System constants:
  R, a fixed closed connected region of the 2-D plane.
  U, a finite set of ids for subregions of R.
  m, the size of U.
  region, a mapping from U to connected subsets of R.
  nbrs, a symmetric relation between ids in U.
  r_virt, the supremum distance between points in u and v for any regions u, v where u ∈ nbrs(v).
  P, a finite set of client node ids where P ∩ U = ∅.
  v_max, the maximum client node speed.
  ε_sample, the GPS sample period.
  d, the broadcast message delay.
  e, the delay factor for VSA outputs.
  ttl_VtoV > d, the VtoVComm message delay.
  t_VSAcor, the VSA stabilization time.

System variables:
  now ∈ ℝ, a clock variable, representing real time.
  loc, a continuously updated array of locations in R of mobile nodes, indexed by node id.

Figure 1: System constants and variables.


The location management scheme we present here is based on the hash table concept and built on top of
the VSA layer and VSA-to-VSA routing service. VSAs and mobile clients are programmed to form a
self-stabilizing, fault-tolerant distributed data structure for location management, where VSAs serve as home
locations for mobile clients. Each client’s id hashes to a VSA region, the client’s home location, whose VSA
is responsible for maintaining the location of the client. Whenever a client node C_p would like to locate
another client node C_q, C_p would compute the home location of C_q by applying a predefined global hash
function to C_q’s id, and query the region represented by the result of that hash for C_q’s location. In order
for our scheme to tolerate crash failures of a limited number of VSAs, each mobile client id actually maps
to a set of VSA home locations; the hash function returns a sequence of region ids as the home locations.
We can use any hash function that provides a sequence of region identifiers; one possibility is a permutation
hash function, where permutations of region ids are lexicographically ordered and indexed by client id.
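
One way to realize such a permutation hash is sketched below in Python. This is a sketch under assumptions rather than the hash used in any particular deployment: client ids are taken to be integers, the index wraps around modulo the number of permutations, and the names kth_permutation and home_locations are hypothetical. The k-th lexicographic permutation is computed directly with the factorial number system, so the full list of permutations is never materialized.

```python
from math import factorial


def kth_permutation(items, k):
    """Return the k-th (0-based) permutation of `items` in lexicographic order."""
    pool = sorted(items)
    k %= factorial(len(pool))  # assumed wrap-around so any integer index is valid
    perm = []
    while pool:
        block = factorial(len(pool) - 1)  # permutations sharing each first element
        index, k = divmod(k, block)
        perm.append(pool.pop(index))
    return perm


def home_locations(client_id, region_ids, f):
    """Map an integer client id to a sequence of f + 1 distinct home regions."""
    if not 0 <= f < len(region_ids):
        raise ValueError("need 0 <= f < number of regions")
    return kth_permutation(region_ids, client_id)[: f + 1]
```

Because every client applies the same function to the same globally known set of region ids, any client can compute the home locations of any other client from its id alone.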


End-to-end routing. Another basic, but difficult to provide, service in mobile networks is end-to-end 
routing. Our self-stabilizing implementation of a mobile client end-to-end communication service is simple, 
given VSA-to-VSA routing and the home location service. A client sends a message to another client by 
using the home location service to discover the destination client’s region and then has a local VSA forward 
the message to the region using the VSA-to-VSA service. 


Paper organization. The rest of the paper is organized as follows: The system model and the virtual
automata layer are described in the next section. In Section 3 we describe the problem specifications we
are interested in. Section 4 describes the VSA-to-VSA communication implementation. In Section 5 we
describe the implementation of the home location service. In Section 6 we present the implementation of the
end-to-end routing service. Concluding remarks appear in Section 7.


2 Datatypes and system model 


The system consists of a bounded region of the 2-D plane, where broadcast-enabled, GPS-updated mobile client
nodes are deployed. We assume the Virtual Stationary Automata programming abstraction [8], which in- 
cludes both the mobile client nodes and virtual stationary automata (VSAs) the real nodes emulate, as well 
as a local broadcast service, V-bcast, between them (see Figure 2). In this section we formally describe the 
system, including: (1) the network tiling, (2) the model for the GPS-augmented mobile clients deployed in 
the network, (3) the model for the virtual nodes deployed in the network, and (4) the specification for the 
local broadcast service in the network. A summary table of datatypes, constants, and variables is in Figure 
1. 


2.1 Network tiling 


The deployment space of the network is assumed to be a fixed, closed, and bounded connected region of 
the 2-D plane called R. R is partitioned into known connected subregions called regions, with unique ids 
drawn from the set of region identifiers U. In practice it may be convenient to restrict regions to be regular 
polygons such as squares or hexagons. We define a neighbor relation nbrs on ids from U. This relation holds 
for any two region identifiers u and v where the supremum distance between points in u and v is bounded 
by a constant r_virt.
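
As a concrete illustration of a tiling, the following Python sketch assumes that regions are closed axis-aligned squares of equal width on a rectangular grid and that two regions are neighbors when they share an edge. Both choices, and the function names, are assumptions made for the example; the model only requires connected subregions and a symmetric neighbor relation. The sketch computes the resulting r_virt as the largest distance between points of any two neighboring regions.

```python
from math import hypot


def sup_distance(u, v, side):
    """Largest distance between a point of square u and a point of square v.

    Regions are closed axis-aligned squares of width `side`, identified by
    integer grid coordinates (column, row).
    """
    dx, dy = abs(u[0] - v[0]), abs(u[1] - v[1])
    # The farthest pair of points are opposite corners of the bounding box.
    return hypot((dx + 1) * side, (dy + 1) * side)


def nbrs(u, cols, rows):
    """Regions sharing an edge with u (an assumed, symmetric choice of nbrs)."""
    x, y = u
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return {(a, b) for a, b in candidates if 0 <= a < cols and 0 <= b < rows}


def r_virt(cols, rows, side):
    """Supremum distance between points of any two neighboring regions."""
    regions = [(x, y) for x in range(cols) for y in range(rows)]
    return max(
        (sup_distance(u, v, side) for u in regions for v in nbrs(u, cols, rows)),
        default=0.0,
    )
```

For unit squares with edge adjacency, the bound works out to the diagonal of a 2-by-1 rectangle.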


2.2 Client nodes 


For each p in the set of physical node identifiers P, we assume a mobile timed I/O automaton client C_p,
whose location in R at any time is referred to as loc(p). Mobile client speed is bounded by a constant v_max.
Clients receive region and time information from the GPS oracle. A GPSupdate(u, now)_p happens every
ε_sample time at each client C_p, indicating to the client the region u where it is currently located and the
current time now. Clients accept this now real-time clock variable as the value of their own local clock. For
simplicity, this local variable progresses at the rate of real time. This implies that, outside of failures, the
local value of now will equal real time.

Each client C_p is equipped with a local broadcast service V-bcast (see Section 2.4), allowing it to
communicate with the VSAs and clients of its own and neighboring regions through bcast(m)_p and brcv(m)_p.

Clients are susceptible to stopping and corruption failures. After a stopping failure, a client performs no 
additional local steps until restarted. If restarted, it starts again from an initial state. If a node suffers from 
a corruption, it experiences a nondeterministic change to its program state. 

Additional arbitrary external interface actions and local state used by algorithms running at the client 
are allowed. For simplicity local steps are assumed to take no time. 


2.3 Virtual Stationary Automata (VSAs) 
Here we describe VSAs; a self-stabilizing implementation of such machines using a GPS oracle and the 
physical mobile nodes in the system can be found in [8, 9]. An abstract VSA is a timing-enabled virtual 
machine that may be emulated by the physical mobile nodes in its region in the network. We formally 
describe a timed machine for region u, V_u, as a TIOA whose program is a tuple of its action signature, sig_u,
valid states, states_u, a start state function mapping clock values to start states, start_u, a discrete transition
function, δ_u, and a set of valid trajectories, τ_u. Trajectories [18] describe state evolution over intervals of
time. The state of V_u is referred to collectively as vstate and is assumed to include a variable corresponding
to real time, vstate.now.

To guarantee that we can emulate a VSA using physical mobile nodes, its interface must be emulatable
by the nodes. Hence, a VSA V_u’s external interface is restricted to be similar to that of a client, including
only stopping failure, corruption, and restart inputs, and the ability to broadcast and receive messages.
Corruption failures result in a nondeterministic change to vstate.

Since a VSA is emulated by physical nodes (corresponding to clients) in its region, its failures are
defined in terms of client failures in its region: (1) If no clients are in the region, the VSA is crashed, (2) If
no failures of clients (corruption or stopping) occur in an alive VSA’s region over some interval, the VSA
does not suffer a failure during that interval, and (3) A VSA may suffer a corruption only if a mobile client
in its region suffers a corruption; the self-stabilizing implementation of a VSA in [8, 9] guarantees that
within t_VSAcor of an arbitrary configuration of the emulation, the emulation’s external trace will look
like that of the abstract VSA, starting from a corrupted abstract state.

While an emulation of V_u would ideally be identical to a legitimate execution of V_u, an abstraction
must reflect that, due to message delays or node failure, the emulation might be behind real time,
appearing to be delayed in performing outputs by up to some time e. The emulation is then a
delay-augmented TIOA, an augmentation of V_u with timing perturbations, represented with buffers Dout[e]_u,
composed with V_u’s outputs. The buffer delays messages by a nondeterministic time in [0, e], where e is
more than V-bcast’s broadcast delay, d (see Section 2.4). Programs must take into account e, as they do d.


Figure 2: Virtual Stationary Automata layer. VSAs and clients communicate locally using V-bcast. VSA
outputs may be delayed in Dout.


2.4 Local broadcast service (V-bcast) 
Communication is in the form of local broadcast service V-bcast, with broadcast radius r_virt and message
delay d. It allows communication between VSAs and clients in the same or neighboring regions. The service
allows the broadcasting and receiving of message m at each port i ∈ P ∪ U through bcast(m)_i and brcv(m)_i.


We assume that V-bcast guarantees two properties between VSAs and between VSAs and clients:
integrity and reliable local delivery. Integrity guarantees that for any brcv(m)_i that occurs, a bcast(m)_j, j ∈
P ∪ U, previously occurred. Reliable local delivery roughly guarantees that a transmission will be received by
nearby ports: If port i, where i is a client or VSA port in any region u, transmits a message, then every port
j, whether a client or VSA port, in region u or neighboring regions during the entire time interval starting
at transmission and ending d later receives the message by the end of the interval. (For this definition, due
to GPSupdate lag, a client is still said to be “in” region u even if it has just left region u but has not yet
received a GPSupdate with the change.)


In practice, a broadcast service has bounded buffers. We assume buffers are large enough that overflows 
do not occur in normal operation. In the event of overflow, overflow messages are lost. 


3 Problem specifications 


We describe the services we will build over the VSA layer: VSA-to-VSA routing, a location service, and 
client-to-client routing, and describe our requirement that implementations be self-stabilizing. 


The following constants (explained/used shortly) are globally known: (1) f < m, a limit on “home
location” VSA failures for a client, (2) h, a function mapping each client id to a sequence of f + 1 distinct
region ids, (3) ttl_VtoV > d, delivery time for the VtoVComm service, (4) ttl_HLS > ε_sample + 2d + 3e + 2ttl_VtoV,
response time of the location management service, and (5) ttl_hb, a refresh period. We assume the following
client mobility and VSA crash failure conditions:

(1) Each client spends at least ε_sample time in a region before moving to another region,

(2) At any time, each alive client’s current region or a neighboring region has a non-crashed VSA that
remains alive for an additional ttl_HLS time,

(3) For any interval of length ttl_VtoV + e, two VSAs alive over the interval are connected via at least one
path of non-crashed VSAs over the entire interval, and

(4) For any interval of length ttl_hb + 2ttl_VtoV + 2e + d, and any alive client q, at least one VSA from h(q)
does not crash during the interval.
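
The inequalities among the globally known constants listed at the start of this section can be checked mechanically before a deployment is configured. The Python sketch below is one hedged way to do so; the function name and parameter names are hypothetical, with ttl_vtov standing for the VtoVComm delivery time, ttl_hls for the location service response time, and eps_sample for the GPS sample period.

```python
def violated_constraints(m, f, d, e, eps_sample, ttl_vtov, ttl_hls):
    """Return the constraints on the global constants that do not hold.

    m: number of regions, f: tolerated home location VSA failures,
    d: V-bcast delay, e: VSA output delay factor.
    """
    problems = []
    if not 0 <= f < m:
        problems.append("f < m")
    if not ttl_vtov > d:
        problems.append("ttl_VtoV > d")
    if not ttl_hls > eps_sample + 2 * d + 3 * e + 2 * ttl_vtov:
        problems.append("ttl_HLS > eps_sample + 2d + 3e + 2 ttl_VtoV")
    return problems
```

An empty result means the chosen values are consistent with the stated inequalities; it says nothing about whether the mobility and crash conditions hold in a given execution.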


3.1  VSA-to-VSA communication service (VtoVComm) specification 


The first service is an inter-VSA routing service, where a VSA from some region u can send a message m
through VtoVsend(v, m)_u to a VSA in another (potentially non-neighboring) region v. Region v’s VSA later
receives m through VtoVrcv(m)_v. The service guarantees two properties:

(1) If a VSA at region u performs a VtoVsend(v, m)_u and both region u and v VSAs are alive over the
time interval beginning with the send and ending ttl_VtoV time later, then the VSA at region v performs a
VtoVrcv(m)_v before the end of the interval, and

(2) If a message is received at some VSA, it was previously sent to that VSA.


3.2 Location service specification 


A location service answers queries from clients for the locations of other clients. A client node p can submit
a query for a recent region of client node q via a HLquery(q)_p action. If few home location failures occur and
q has been in the system for a sufficient amount of time, the service responds within bounded time with a
recent region location of q, qreg, through a HLreply(q, qreg)_p action.


To be more exact, the location service guarantees that if a client p performs a HLquery to find an alive
client q that has been in the system longer than ε_sample + d + ttl_VtoV + e + ttl_HLS time, and client p does
not crash or move to a different region for ttl_HLS time, then:

(1) Within ttl_HLS time, client p will perform a HLreply with a region for q, and
(2) If p performs a HLreply(q, qreg)_p, then p had requested q’s location and q was either: (a) alive in region
qreg within the last ttl_hb time, or (b) failed for at most ttl_hb + ttl_HLS − ε_sample time.


3.3 Client end-to-end routing (EtoEComm) specification 

End-to-end routing is an important application for ad-hoc networks. The V-bcast service provides a local 
broadcast service where VSAs and clients can communicate with VSAs and clients in neighboring regions. 
VtoVComm allows arbitrary VSAs to communicate. End-to-end routing (EtoEComm) allows arbitrary
clients to communicate: a client p sends message m to client q using send(q, m)_p, which is received by q in
bounded time via receive(m)_q.


If clients p and q do not crash for ttl_HLS time, clients do not change regions for ttl_HLS time after a send,
and q has been in the system at least ttl_HLS + ε_sample + d + ttl_VtoV + e time, then:
(1) If client p sends message m to q, q will receive m within ttl_HLS + 2d + 2e + 4 ttl_VtoV time, and
(2) Any message received by a client was previously sent to the client.


3.4  Self-stabilizing implementations 


We require implementations of the above services to be self-stabilizing. A system configuration is safe with 
respect to a specification and implementation if any admissible execution fragment of the implementation 
starting from the configuration is an admissible execution fragment of the specification. An implemen- 
tation is self-stabilizing if starting from any configuration, an admissible execution of the implementation 
eventually reaches a safe configuration. Notice that in the presence of corruptions, if an implementation is 
self-stabilizing, then any long enough execution fragment of the implementation will eventually have a suffix 
that looks like the suffix of some correct execution of the specification, until a corruption occurs. 


Each of the above services’ self-stabilizing implementations will be built on top of self-stabilizing
implementations of other services: VtoVComm over the VSA layer, the location service over the VSA layer
and VtoVComm service, and EtoEComm over the VSA layer, VtoVComm, and location services. Each
self-stabilizing implementation uses lower level services without feedback, so lower level service executions are not
influenced by the upper level services. This allows us to guarantee that higher level service implementations
are still self-stabilizing through fair composition [11].

Our service implementations, starting from an arbitrary system configuration, stabilize within the
following times: VtoVComm: ttl_VtoV + d time after the VSA layer stabilizes (t_VSAcor time), the location
service: max(ttl_HLS, 2e + 3ttl_VtoV + ttl_hb + 2d) time after VtoVComm stabilizes, and EtoEComm:
ttl_hb + 2d + 2e + ttl_VtoV time after the location service has stabilized.
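
Because each service stabilizes only after the layer beneath it has stabilized, the bounds above accumulate. The following Python sketch adds them up, reading the three bounds as ttl_VtoV + d, max(ttl_HLS, 2e + 3ttl_VtoV + ttl_hb + 2d), and ttl_hb + 2d + 2e + ttl_VtoV. The function name, the parameter names, and this reading of the refresh period as ttl_hb are assumptions of the sketch.

```python
def stabilization_times(t_vsacor, d, e, ttl_vtov, ttl_hls, ttl_hb):
    """Cumulative time, from an arbitrary configuration, until each layer is stable."""
    # VtoVComm stabilizes ttl_VtoV + d after the VSA layer (t_VSAcor).
    vtov = t_vsacor + ttl_vtov + d
    # The location service stabilizes after VtoVComm has stabilized.
    hls = vtov + max(ttl_hls, 2 * e + 3 * ttl_vtov + ttl_hb + 2 * d)
    # End-to-end routing stabilizes after the location service has stabilized.
    etoe = hls + ttl_hb + 2 * d + 2 * e + ttl_vtov
    return {"VtoVComm": vtov, "location service": hls, "EtoEComm": etoe}
```

For example, with t_VSAcor = 10, d = 1, e = 2, ttl_VtoV = 5, ttl_HLS = 30, and ttl_hb = 4 (arbitrary illustrative values), the three layers are stable after 16, 46, and 61 time units respectively.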


4 VSA to VSA communication (VtoVComm) implementation 


The VtoVComm service allows communication of messages between any two VSAs through VtoVsend
and VtoVrcv actions, as long as there is a path of non-failed VSAs between them. The VtoVComm
service is built on top of the V-bcast service [8], which supports communication between two
neighboring VSAs (see Figure 3).

VSA-to-VSA communication is based on a greedy DFS procedure. When a VSA receives
a message for which it is not the destination, it chooses a neighboring VSA that is on a shortest
path to the destination VSA and forwards the message in a forward message to that neighbor. If the
VSA does not receive an indication through a found message that the message has been delivered to the
destination within some bounded amount of time, it then forwards the message to the neighboring VSA
on the next shortest path to the destination VSA, and so on. This choice of neighbors is greedy in the
sense that the next neighbor chosen to receive the forwarded message is the one on a shortest path to
the destination VSA, excluding the neighbors associated with previous tries. The greedy DFS can
turn into a flood in pathological situations in which the destination is the last VSA reached.

Figure 3: VSA-to-VSA communication (VtoVComm). A VSA at region u sends a message m to region v’s
VSA with a VtoVsend(v, m)_u. The message is eventually received at region v by VtoVrcv(m)_v.
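
The greedy choice of the next neighbor can be sketched in Python as follows. This is an illustration under assumptions, not the definition of the selection function used by the algorithm: the region graph is given as an adjacency dictionary, "shortest path" is measured in hops by breadth-first search, ties are broken by the smallest region id, and the names hop_distances, next_neighbor, and try_order are hypothetical.

```python
from collections import deque


def hop_distances(adj, dest):
    """Breadth-first hop counts from every reachable region to `dest`."""
    dist = {dest: 0}
    queue = deque([dest])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def next_neighbor(nbr_set, adj, dest):
    """Pick the untried neighbor closest (in hops) to dest; ties go to the smallest id."""
    dist = hop_distances(adj, dest)
    return min(nbr_set, key=lambda n: (dist.get(n, float("inf")), n))


def try_order(u, adj, dest):
    """Order in which region u would try its neighbors, one per timeout."""
    remaining, order = set(adj[u]), []
    while remaining:
        nxt = next_neighbor(remaining, adj, dest)
        remaining.remove(nxt)
        order.append(nxt)
    return order
```

In a graph where region 0 has neighbors 1 and 2 and only region 1 leads toward destination 3, region 0 tries 1 first and falls back to 2 only after a timeout.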

Signature:
  Input VtoVsend(d, m)_u, d ∈ U, m arbitrary
  Input brcv(m)_u, m ∈ ({forward} × Msg × U × {u}) ∪ ({found} × Msg)
  Output bcast(m)_u, m arbitrary
  Output VtoVrcv(m)_u, m arbitrary
  Internal DFStimeout(msg)_u, msg ∈ Msg
  Internal DFSclean(msg)_u, msg ∈ Msg

  Msg = M × U × U × ℝ, of the form (m, v2vs, v2vd, ts)

State:
  analog now ∈ ℝ, the current real time
  bcastq, VtoVrcvq, queues of messages, initially ∅
  DFStable, a table indexed on message tuples in Msg with entries in (nbrs(u) × 2^nbrs(u) × ℝ),
    of the form (isrc, NbrSet, nbrTO), initially ∅
  curNbr ∈ U, initially ⊥

Trajectories:
  satisfies
    d(now) = 1
    constant bcastq, VtoVrcvq, DFStable, curNbr
  stops when
    Any precondition is satisfied.

Actions:
  Output bcast(m)_u
    Precondition:
      m ∈ bcastq
    Effect:
      bcastq ← bcastq \ {m}

  Output VtoVrcv(m)_u
    Precondition:
      m ∈ VtoVrcvq
    Effect:
      VtoVrcvq ← VtoVrcvq \ {m}

  Input VtoVsend(d, m)_u
    Effect:
      if u = d then
        VtoVrcvq ← VtoVrcvq ∪ {m}
      else DFStable((m, u, d, now)) ← (u, nbrs(u), now)

  Internal DFSclean(msg)_u
    Precondition:
      DFStable(msg) ≠ null ∧ msg.ts ∉ [now − ttl_VtoV, now]
    Effect:
      DFStable(msg) ← null

  Internal DFStimeout(msg)_u
    Precondition:
      DFStable(msg).nbrTO < now ∨ DFStable(msg).nbrTO > now + δ(u, msg.v2vd)
    Effect:
      if DFStable(msg).NbrSet ≠ ∅ then
        curNbr ← NxtNbr(DFStable(msg).NbrSet, DFStable(msg).isrc, u, msg.v2vd)
        DFStable(msg).NbrSet ← DFStable(msg).NbrSet \ {curNbr}
        bcastq ← bcastq ∪ {(forward, msg, u, curNbr)}
        DFStable(msg).nbrTO ← now + δ(u, msg.v2vd)
      else DFStable(msg) ← null

  Input brcv((forward, msg, isrc, u))_u
    Effect:
      if msg.ts ∈ [now − ttl_VtoV, now] then
        if u = msg.v2vd then
          bcastq ← bcastq ∪ {(found, msg)}
          VtoVrcvq ← VtoVrcvq ∪ {msg.m}
        else if DFStable(msg) = null then
          DFStable(msg) ← (isrc, nbrs(u) \ {isrc}, now)

  Input brcv((found, msg))_u
    Effect:
      if DFStable(msg) ≠ null then
        DFStable(msg) ← null
        if u ≠ msg.v2vs then
          bcastq ← bcastq ∪ {(found, msg)}


Figure 4: Greedy DFS algorithm at V_u^VtoV for region u.


Self-stabilization of the algorithm is ensured by the use of a real-time timestamp to identify the version 
of the DFS. Versions that are too old are eliminated from the system, and new versions are handled as completely
new attempts to complete a greedy DFS towards the destination. 
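
The version check by timestamp can be sketched in Python as below. The sketch assumes that a message is a tuple (m, v2vs, v2vd, ts), as in the form given for Msg, and that the table is a dictionary keyed by such tuples; the function names are hypothetical. An entry is kept only if its timestamp lies in the window [now − ttl_VtoV, now], so both stale entries and corrupted entries with future timestamps are discarded.

```python
def is_fresh(ts, now, ttl_vtov):
    """True if timestamp ts lies in the window [now - ttl_vtov, now]."""
    return now - ttl_vtov <= ts <= now


def clean_table(dfs_table, now, ttl_vtov):
    """Drop entries whose message timestamp is outside the freshness window."""
    return {
        msg: entry
        for msg, entry in dfs_table.items()
        if is_fresh(msg[3], now, ttl_vtov)  # msg = (m, v2vs, v2vd, ts)
    }
```

Since an arbitrary initial state can only contain finitely many entries and each of them leaves the window within ttl_VtoV time, the table eventually holds only entries created by genuine sends.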


We first present a simple greedy DFS algorithm that gradually expands the search until all paths are 
checked. This algorithm will find a path to the destination if such a path exists throughout the DFS 
execution. We also present a modification of the algorithm to produce a persistent version of the greedy 
DFS algorithm in which each VSA repeatedly tries to forward messages along previously unsuccessful VSA 
paths to take advantage of (possibly temporary) recoveries of VSAs that may result in a viable path [13]. 
Again, the persistent greedy DFS can turn into a persistent flood in pathological situations in which the 
destination is the last VSA reached. 


4.1 Detailed code description 


The following code description refers to the code for VSA $V_u^{VtoV}$ in Figure 4. The main state variable
DFStable keeps track of information for messages that are still waiting to be delivered. For each such
unique message, the table stores the intermediate source isrc of the message, the set NbrSet of VSA
neighbors that have yet to have the message forwarded to them, and a timeout nbrTO for the
neighbor currently being tried for forwarding the message.


A source VSA $V_u^{VtoV}$ sends a message m to a destination VSA in region d using $VtoVsend(d,m)_u$ (line
33). If u = d then $V_u^{VtoV}$ receives m through $VtoVrcv(m)_u$ (lines 35-36). Otherwise the destination VSA is
another VSA and $V_u^{VtoV}$ sets the DFStable mapping of an augmented version of the message, ⟨m, u, d, now⟩,
to ⟨u, nbrs(u), now⟩. This enables the start of a new DFS execution to forward the message to its destination
(line 37).


Internal DFStimeout(msg)u
Precondition:
  DFStable(msg).nbrTO < now ∨ DFStable(msg).nbrTO > now + δ(u, msg.v2vd)
Effect:
  if DFStable(msg).NbrSet ≠ ∅ then
    curNbr ← NxtNbr(DFStable(msg).NbrSet, DFStable(msg).isrc, u, msg.v2vd)
    DFStable(msg).NbrSet ← DFStable(msg).NbrSet \ {curNbr}
    for each n ∈ nbrs(u) \ DFStable(msg).NbrSet
      bcastq ← bcastq ∪ {⟨forward, msg, u, n⟩}
    DFStable(msg).nbrTO ← now + δ(u, msg.v2vd)
  else DFStable(msg) ← null


Figure 5: The Persistent Greedy DFS algorithm at $V_u^{VtoV}$ for region u is the same as the Greedy DFS
algorithm, except that the broadcast of a DFS message to curNbr in the DFStimeout action is replaced
with a broadcast to curNbr and all previously attempted neighbors.

Whenever the nbrTO of a message in DFStable times out, it triggers the forwarding of the message to the
next neighbor in the DFS, if possible. If the message hasn’t yet been forwarded to all of the relevant neighbors
(DFStable(msg).NbrSet is not empty), then the next neighbor closest to the destination VSA that has not
yet had a message forwarded to it, curNbr, is selected and the message tuple msg is then forwarded in a
forward message to it using the V-bcast service (lines 45-48). The timeout variable DFStable(msg).nbrTO
for this attempt at forwarding is set to now + δ(curNbr, msg.v2vd) (line 49). If the message has already
been forwarded to all the relevant neighbors, then DFStable(msg) is set to null, indicating that nothing
more can be done.


If a message tuple msg whose destination is $V_u^{VtoV}$ is received in a forward message from isrc, then
VSA $V_u^{VtoV}$ broadcasts a ⟨found, msg⟩ message via the V-bcast service and VtoVrcv’s the message msg.m.
The found message notifies neighbors still participating in the DFS for msg that it has reached its final
destination VSA. No forwarding is required (lines 55-57). Otherwise, if msg is not destined for $V_u^{VtoV}$ and
$V_u^{VtoV}$ does not already have an entry in DFStable for msg, then the message must be forwarded to its
destination. DFStable(msg) is set to ⟨isrc, nbrs(u) \ {isrc}, now⟩ (line 59), storing the intermediate source,
initializing the set of neighbors that have yet to have the message forwarded to them, and setting nbrTO to
now. Setting nbrTO to now immediately enables the DFStimeout action for msg, triggering the forwarding
of msg to one of $V_u^{VtoV}$’s neighbors.


When a found message is received for a message tuple msg that is mapped by DFStable, the entry in
DFStable is erased, preventing additional forwarding (line 64). If u ≠ msg.v2vs then VSA $V_u^{VtoV}$ broadcasts
a found message via the V-bcast service (lines 65-66), notifying neighbors that are still participating for msg
that it has been delivered. Clearly, if u = msg.v2vs, then no found message is required and no further action
needs to be taken.
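
The actions described above can be pictured with a small Python sketch of one VSA's DFS bookkeeping. This is an illustrative sketch under stated assumptions, not the authors' implementation: the class and method names (`GreedyDfsVsa`, `send`, `timeout`, `on_forward`, `on_found`), the caller-supplied `dist` function standing in for NxtNbr, and the use of `<=` so that setting nbrTO to now enables the timeout immediately are all choices made here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    m: str        # payload
    v2vs: int     # source region
    v2vd: int     # destination region
    ts: float     # timestamp taken at VtoVsend


@dataclass
class Entry:
    isrc: int       # intermediate source of the message
    nbr_set: set    # neighbors not yet tried
    nbr_to: float   # timeout for the neighbor currently being tried


class GreedyDfsVsa:
    """Hypothetical stand-in for the DFS state kept by one VSA in region u."""

    def __init__(self, u, nbrs, dist, delta, ttl):
        self.u, self.nbrs = u, set(nbrs)
        self.dist, self.delta, self.ttl = dist, delta, ttl
        self.table = {}     # plays the role of DFStable
        self.bcastq = []    # outgoing V-bcast messages
        self.rcvq = []      # messages to hand up via VtoVrcv

    def send(self, d, m, now):
        if d == self.u:
            self.rcvq.append(m)
        else:
            self.table[Msg(m, self.u, d, now)] = Entry(self.u, set(self.nbrs), now)

    def timeout(self, msg, now):
        entry = self.table.get(msg)
        if entry is None:
            return
        bound = self.delta(self.u, msg.v2vd)
        # Assumption: "<=" so that nbrTO = now enables the action at once.
        if not (entry.nbr_to <= now or entry.nbr_to > now + bound):
            return
        if entry.nbr_set:
            # Greedy choice: untried neighbor closest to the destination.
            cur = min(entry.nbr_set, key=lambda n: (self.dist(n, msg.v2vd), n))
            entry.nbr_set.discard(cur)
            self.bcastq.append(("forward", msg, self.u, cur))
            entry.nbr_to = now + bound
        else:
            del self.table[msg]

    def on_forward(self, msg, isrc, now):
        if not (now - self.ttl <= msg.ts <= now):
            return  # stale or future-dated message: ignore
        if self.u == msg.v2vd:
            self.bcastq.append(("found", msg))
            self.rcvq.append(msg.m)
        elif msg not in self.table:
            self.table[msg] = Entry(isrc, self.nbrs - {isrc}, now)

    def on_found(self, msg):
        if msg in self.table:
            del self.table[msg]
            if self.u != msg.v2vs:
                self.bcastq.append(("found", msg))
```

In this sketch a second `timeout` call made before nbrTO has passed does nothing, which mirrors the role of the δ bound as a wait that cuts down on unneeded retransmissions.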


4.2 Correctness 

We now prove the correctness of the algorithm. Let the source VSA be $V_s^{VtoV}$, the destination VSA be
$V_d^{VtoV}$, the message sent be m, and a DFS execution exe from $V_s^{VtoV}$ to $V_d^{VtoV}$ be as defined above. We
assume a given function $\delta : U \times U \to \mathbb{N}$, where $\delta(x,y)$ is a bound on the time required for a message
to arrive from x to y. This bound is based both on the distance between x and y, and the quality of the
communication links in the network. Since the DFS and the $\delta$ function are just employed to cut down on
unneeded retransmission of messages, any non-negative wait time is sufficient for correctness. However, a
wait time dependent on hop count between regions will be the most message-efficient. We argue that if no
corruption failures occur and the status (failed or non-failed) of every VSA in U doesn’t change during exe,
then the following holds:

Lemma 4.1 If $V_s^{VtoV}$ is a non-failed VSA that performs a VtoVsend(d,m) at time t, and there exists a
path of non-failed VSAs between $V_s^{VtoV}$ and $V_d^{VtoV}$ from time t to time $t + ttl_{VtoV}$, then $V_d^{VtoV}$ performs a
VtoVrcv(m) in the interval $[t, t + ttl_{VtoV}]$, for

$$ttl_{VtoV} > \left[ e + d + \left( \max_{u,v \in U} \delta(u,v) + \max_{u \in U} |nbrs(u)| - 1 \right) \right] \cdot (|U| - 1).$$


Proof sketch: The proof is by induction on the distance n between $V_s^{VtoV}$ and $V_d^{VtoV}$ on the shortest non-deserted
path, where the distance is the number of VSAs along the path, including $V_d^{VtoV}$. In the case n = 0, the
message m is destined for the same VSA. According to line 35, the message is VtoVrcv’ed at the VSA.


Assume that the lemma holds for every n' < n.


Let n be the VSA-distance between $V_s^{VtoV}$ and $V_d^{VtoV}$. There exists a path of non-failed VSAs between
$V_s^{VtoV}$ and $V_d^{VtoV}$. Therefore, there exists a VSA $V_u^{VtoV}$, which is a neighbor of $V_s^{VtoV}$, such that there
exists a path of non-failed VSAs between $V_u^{VtoV}$ and $V_d^{VtoV}$. The distance between $V_u^{VtoV}$ and $V_d^{VtoV}$ is
n − 1, hence the induction assumption holds for $V_u^{VtoV}$ and $V_d^{VtoV}$. Therefore, a message sent from $V_u^{VtoV}$
to $V_d^{VtoV}$ eventually reaches $V_d^{VtoV}$. The same assumption holds for $V_s^{VtoV}$ and $V_u^{VtoV}$; therefore, $V_d^{VtoV}$
receives the message m sent from region s. □


Lemma 4.2 The number of times that a message tuple msg is re-broadcast is bounded. 


Proof sketch: The broadcast of a message tuple stops in either of the following cases: 


• A found message was received for msg. According to line 62, if the value of DFStable(msg) was not
already null, it gets set to null, preventing $V_u^{VtoV}$ from doing anything with subsequent found messages.
If $V_u^{VtoV}$ was not the original source of msg, it retransmits found for msg exactly one time. If a found
for msg is received again, it will be ignored. A forward message for msg would need to be received
again in order to result in any additional found messages for msg at this VSA. This, however, cannot
happen since each VSA participating in the DFS waits before triggering new forward messages until
found messages would have been returned.


• For each VSA neighbor, if VSA $V_u^{VtoV}$ does not receive a found message for msg, it will time out via
nbrTO. Once the set of neighbors to be queried is exhausted, the VSA erases the entry for msg in
DFStable, preventing any additional forwarding by itself. □


Lemma 4.3 Once corruptions stop and the VSA layer has stabilized, it takes up to $d + ttl_{VtoV}$ time for
VtoVComm to stabilize.


Proof sketch: Any message in the system that is being forwarded by VtoVComm will be cleaned out of
the system if it is older than $ttl_{VtoV}$ or newer than the current time. As a result, the longest a “bad”
message can be in the system is this time, plus up to an additional d time where it could have been in
transmission before being received by a VSA. □


5 Home Location Service (HLS) implementation 


The location service, as described in the last section, allows a client to determine a recent region of another
alive client. In our implementation, called the Home Location Service (HLS), we accomplish this using home
locations. Recall that the home locations of a client node p are f + 1 regions whose VSAs are occasionally
updated with p’s region. The home locations are calculated with a hash function h, which maps a client’s id to
a list of VSA regions and is known to all VSAs. These home location VSAs can then be queried by other
VSAs to determine a recent region of p.
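
The paper only requires that h map a client id to f + 1 distinct regions and be known to all VSAs; it does not fix a construction. As one possible illustration, the sketch below ranks regions by a SHA-256 digest of the (client id, region) pair and takes the first f + 1. The function name `home_locations` and this ranking scheme are assumptions made here, not part of the original algorithm.

```python
import hashlib


def home_locations(client_id: str, regions: list, f: int) -> list:
    """Return f + 1 distinct regions for client_id (hypothetical construction).

    The i-th element of the result plays the role of h(p, i) for i = 1..f+1.
    """
    if f + 1 > len(regions):
        raise ValueError("need at least f + 1 regions")

    def score(region):
        # Deterministic per-(client, region) digest; every VSA computes the same value.
        return hashlib.sha256(f"{client_id}|{region}".encode()).digest()

    # Sorting by digest gives a client-specific ordering of all regions;
    # the first f + 1 are distinct because the input regions are distinct.
    return sorted(regions, key=score)[: f + 1]
```

Because every VSA evaluates the same deterministic function, a querying VSA and an updating VSA agree on where to send vlocQuery and locUpdate messages without any coordination.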

Figure 6 depicts how the VSA abstraction and VtoVComm are used in HLS. The HLS implementation
consists of two parts: a client-side portion and a VSA-side portion. $C_p^{HL}$ is a subautomaton of client p
that interacts with VSAs to provide HLS. It is responsible for notifying VSAs in its current and neighboring
regions which region it is in. Also, $C_p^{HL}$ handles each request submitted by input $HLquery(q)_p$ for q’s region
by broadcasting the query via V-bcast to VSAs $V_u^{HL}$ in its current and neighboring regions. It translates
responses from the VSAs into HLreply outputs.

Figure 6: Home Location Service. A client p can query local VSAs for client q’s region. The VSAs then
query home locations of q, using VtoVComm, for a recent region of q, and return it to p.


For the VSA-side, $V_u^{HL}$ and $V_v^{HL}$ in Figure 6 are home location VSAs corresponding to regions u and v
of the network; they are subautomata of VSAs $V_u$ and $V_v$. $V_u^{HL}$ takes a request from a local client for client
node q’s region, calculates q’s home locations using the hash function, and then sends location queries to the
home locations using VtoVComm. Those virtual automata respond with the region information they have
for q, which is then provided by $V_u^{HL}$ to the requesting client. $V_u^{HL}$ is also responsible both for informing
the home locations of each client p located in its region or neighboring regions of p’s region, and for maintaining
and answering queries for the regions of clients for which it is a home location.


Time and region information from the GPS oracle is used throughout the HLS algorithm, by clients and 
VSAs, to timestamp and label information and messages. This information is used to guarantee timeliness 
of replies from the HLS service, and to stabilize the service after faults. Timestamps are used to determine 
if information is too old or too new, while region information allows clients and VSAs to know which other 
clients and VSAs to interact with. 


5.1 HLS client actions 
The code executed by client p’s $C_p^{HL}$ is in Figure 7.


Clients receive GPSupdates every $\epsilon_{sample}$ time from the GPS automaton (lines 28-33), making them aware
of their current region and the time. If a client’s region has changed, the client immediately sends a heartbeat
message with its id, current time and region information. The client periodically reminds its current and
neighboring region VSAs of its region by broadcasting additional heartbeat messages every $ttl_{hb}$ time, where
$ttl_{hb}$ is a known constant (lines 35-39).

$C_p^{HL}$ also handles the HLquery(q) inputs it receives (line 41). This request for q’s location is stored in
a queryq table and, once the client knows its own region, translated into a ⟨clocQuery, q⟩ message that is
broadcast, together with the VSA region, to local regions’ VSAs (lines 45-49). If $C_p^{HL}$ eventually receives a
⟨clocReply, q, qreg⟩ message from its current or neighboring region’s VSA for a client q in queryq, indicating
that node q was in region qreg (lines 51-55), it clears the entry for q in queryq, and outputs a HLreply(q, qreg)
of the information (lines 57-61). If the request for q’s location goes unanswered for more than $ttl_{HLS} − \epsilon_{sample}$
time, then the request has failed and is removed (lines 63-67).


Constants:
  ttl_hb
  ttl_HLS

Signature:
  Input GPSupdate(v, t)p, v ∈ U, t ∈ R
  Input HLquery(q)p, q ∈ P
  Input brcv(⟨m, u⟩)p, m ∈ ({clocReply} × P × U × U), u ∈ U
  Output bcast(⟨m, reg⟩)p, m ∈ {⟨heartbeat, now, p⟩} ∪ ({clocQuery} × P)
  Output HLreply(q, v)p, q ∈ P, v ∈ U
  Internal queryfail(q)p, q ∈ P

State:
  analog now ∈ R, current real time, initially ⊥
  hbTO < now + ttl_hb, ∈ R, the next heartbeat time
  reg ∈ U, the current region, initially ⊥
  queryq, a table from P to R, initially ∅
  queryrcv, a queue of P × U pairs, initially ∅

Trajectories:
  satisfies
    d(now) = 1
    constant hbTO, reg, queryq, queryrcv
  stops when
    Any precondition is satisfied.

Actions:
  Output bcast(⟨⟨heartbeat, now, p⟩, reg⟩)p
  Precondition:
    hbTO < now ∧ reg ≠ ⊥
  Effect:
    hbTO ← now + ttl_hb

  Input HLquery(q)p
  Effect:
    queryq(q) ← ∞

  Output bcast(⟨⟨clocQuery, q⟩, reg⟩)p
  Precondition:
    reg ≠ ⊥ ∧ queryq(q) > now + ttl_HLS − ε_sample
  Effect:
    queryq(q) ← now + ttl_HLS − ε_sample

  Input brcv(⟨⟨clocReply, q, qreg⟩, u⟩)p
  Effect:
    if (u ∈ nbrs(reg) ∪ {reg} ∧ queryq(q) ≠ null) then
      queryrcv ← queryrcv ∪ {⟨q, qreg⟩}
      queryq(q) ← null

  Output HLreply(q, qreg)p
  Precondition:
    ⟨q, qreg⟩ ∈ queryrcv
  Effect:
    queryrcv ← queryrcv \ {⟨q, qreg⟩}

  Input GPSupdate(v, t)p
  Effect:
    now ← t
    if reg ≠ v then
      reg ← v
      hbTO ← now

  Internal queryfail(q)p
  Precondition:
    queryq(q) < now
  Effect:
    queryq(q) ← null


Figure 7: HLS’s $C_p^{HL}$ automaton. This client subautomaton serves as a bridge between the client’s
requests and the VSA layer.


5.2 HLS VSA actions
The code for automaton $V_u^{HL}$ appears in Figure 8.


First, the VSA knows which clients are in its or neighboring regions through heartbeat messages. If a
VSA hears a heartbeat message from a client p claiming to be in its region or a neighboring region, the
VSA sends a locUpdate message for p, with p’s heartbeat timestamp and region, through VtoVComm to the
VSAs at home locations of client p (lines 42-46), where home locations are computed using the known hash
function h from P × {1, ..., f + 1} to U.


When a VSA receives one of these locUpdate messages for a client p, it stores both the region indicated
in the message as p’s current region and the attached heartbeat timestamp in its loc table (lines 48-51).
This location information for p is refreshed each time the VSA receives a locUpdate for client p with a newer
heartbeat timestamp. Since a client sends a heartbeat message every $ttl_{hb}$ time, which can take up to d + e
time to arrive at and trigger a VSA to send a locUpdate message through VtoVComm, which can take
$ttl_{VtoV}$ time to be delivered at a home location, an entry for client p is erased if its timestamp is older than
$ttl_{hb} + d + e + ttl_{VtoV}$ (lines 53-57).
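
The refresh and expiry rules for the loc table can be sketched in a few lines of Python. This is a hedged illustration only: the class name `LocTable`, its method names, and the use of non-strict comparisons at the interval endpoints are assumptions made here; the timestamps are the heartbeat times carried in locUpdate messages, as described above.

```python
class LocTable:
    """Hypothetical stand-in for the loc table kept by a home location VSA."""

    def __init__(self, ttl_hb, d, e, ttl_vtov):
        # Oldest heartbeat timestamp that is still considered fresh.
        self.max_age = ttl_hb + d + e + ttl_vtov
        self.loc = {}  # client id -> (region, heartbeat timestamp)

    def on_loc_update(self, q, region, ts, now):
        old = self.loc.get(q)
        old_ts = old[1] if old is not None else float("-inf")
        # Accept only strictly newer timestamps that are not in the future.
        if old_ts < ts <= now:
            self.loc[q] = (region, ts)

    def clean(self, now):
        # Drop entries whose timestamp lies outside [now - max_age, now].
        stale = [q for q, (_, ts) in self.loc.items()
                 if not (now - self.max_age <= ts <= now)]
        for q in stale:
            del self.loc[q]
```

Rejecting future-dated timestamps is what lets the table recover from corrupted entries: a bad entry either looks too new and is cleaned at once, or ages out after at most `max_age`.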


The other responsibility of the VSA is to receive and respond to local client requests for location
information on other clients. A client p in a VSA’s region or a neighboring region v can send a query for q’s
current location to the VSA. This is done via a mobile node’s broadcast of a ⟨⟨clocQuery, q⟩, v⟩ message.
When the VSA at region u receives this query, if no outstanding query for q exists, it notes the request for q
in lquery(q), and sends a vlocQuery message to q’s f + 1 home locations, querying about q’s location (lines
59-65). Any home location that receives such a message and has an entry for q’s region responds with a
vlocReply to the querying VSA with the region (lines 67-70).

If the querying VSA at u receives a vlocReply in response to an outstanding location request for a client
q, it stores the attached region information in lquery(q) (lines 72-75), broadcasts a clocReply message with
q and its region to local clients, and erases the entry for lquery(q) (lines 77-81). If, however, $2ttl_{VtoV} + 2e$
time passes since a request for q’s region was received by a local client and there is no entry for q’s region,
lquery(q) is just erased (lines 83-87).
Constants:
  ttl_VtoV
  ttl_hb
  h, a hash function from P × {1, ..., f + 1} to U
    such that for p ∈ P, x, y ∈ {1, ..., f + 1},
    if x ≠ y, then h(p, x) ≠ h(p, y)

Signature:
  Input brcv(⟨m, v⟩)u, m ∈ ({heartbeat} × R × P) ∪ ({clocQuery} × P), v ∈ U
  Input VtoVrcv(⟨v, m⟩)u, v ∈ U, m ∈ ({locUpdate} × P × R) ∪ ({vlocQuery} × P) ∪ ({vlocReply} × P × U)
  Output bcast(⟨⟨clocReply, q, qreg⟩, u⟩)u, q ∈ P, qreg ∈ U
  Output VtoVsend(v, m)u, v ∈ U
  Internal updateHL(q)u, q ∈ P
  Internal cleanLoc(q)u, q ∈ P
  Internal cleanLquery(q)u, q ∈ P

State:
  loc, a table indexed on process ids with entries
    from U × R≥0, of the form ⟨reg, ts⟩
  lquery, a table indexed on process ids with entries
    from R≥0 × U, of the form ⟨to, qreg⟩
  vtovq, a queue of tuples from U × msg
  (Above all initially empty)
  analog now ∈ R≥0, the current real time

Trajectories:
  satisfies
    d(now) = 1
    constant loc, lquery, vtovq
  stops when
    Any precondition is satisfied.

Actions:
  Output VtoVsend(v, m)u
  Precondition:
    ⟨v, m⟩ ∈ vtovq
  Effect:
    vtovq ← vtovq \ {⟨v, m⟩}

  Input brcv(⟨⟨heartbeat, t, p⟩, v⟩)u
  Effect:
    if (v ∈ nbrs(u) ∪ {u} ∧ now − d < t < now) then
      for i = 1 to f + 1
        vtovq ← vtovq ∪ {⟨h(p, i), ⟨v, ⟨locUpdate, p, t⟩⟩⟩}

  Input VtoVrcv(⟨v, ⟨locUpdate, q, t⟩⟩)u
  Effect:
    if loc(q).ts < t < now then
      loc(q) ← ⟨v, t⟩

  Internal cleanLoc(q)u
  Precondition:
    loc(q).ts ∉ [now − ttl_hb − d − e − ttl_VtoV, now]
  Effect:
    loc(q) ← null

  Input brcv(⟨⟨clocQuery, q⟩, v⟩)u
  Effect:
    if ((lquery(q) = null ∨ lquery(q).to < now)
        ∧ v ∈ nbrs(u) ∪ {u}) then
      lquery(q) ← ⟨now + 2ttl_VtoV + 2e, ⊥⟩
      for i = 1 to f + 1
        vtovq ← vtovq ∪ {⟨h(q, i), ⟨u, ⟨vlocQuery, q⟩⟩⟩}

  Input VtoVrcv(⟨v, ⟨vlocQuery, q⟩⟩)u
  Effect:
    if loc(q) ≠ null then
      vtovq ← vtovq ∪ {⟨v, ⟨u, ⟨vlocReply, q, loc(q).reg⟩⟩⟩}

  Input VtoVrcv(⟨v, ⟨vlocReply, q, qreg⟩⟩)u
  Effect:
    if lquery(q) ≠ null then
      lquery(q).qreg ← qreg

  Output bcast(⟨⟨clocReply, q, lquery(q).qreg⟩, u⟩)u
  Precondition:
    lquery(q).qreg ≠ ⊥
  Effect:
    lquery(q) ← null

  Internal cleanLquery(q)u
  Precondition:
    lquery(q).to ∉ [now, now + 2ttl_VtoV + 2e]
  Effect:
    lquery(q) ← null


Figure 8: HLS’s $V_u^{HL}$ automaton.
5.3 Correctness
We make the system assumptions described in Section 3. Call $C_g$ the first global configuration where the
system is consistent. For the following two lemmas and theorem, assume we are in a configuration after $C_g$,
and that no corruption failures occur.
Lemma 5.1 For any VSA u, if there is a request for q’s region in lquery, it was submitted through a
HLquery(q) at a client within the last $\epsilon_{sample} + d + 2ttl_{VtoV} + 2e$ time.

Proof sketch: Once a request is submitted by a client to $C_p^{HL}$, if the client has never received a GPSupdate,
it can take up to $\epsilon_{sample}$ time for the client to receive one. After the client has received one, it then broadcasts
the request to local VSAs, which takes up to d time to be delivered. VSAs then hold these queries until they
expire $2ttl_{VtoV} + 2e$ later. □


Lemma 5.2 Starting $\epsilon_{sample} + d + e + ttl_{VtoV}$ time after client p enters the system and until p fails, for
each interval of length $ttl_{VtoV} + e$, all but f of p’s home locations will have a non-null loc(p) entry for the
entire interval. If client p is alive and there is some VSA u such that loc(p) is not null, p was alive and
located in loc(p).reg within the last $\epsilon_{sample} + d + e + ttl_{VtoV}$ time.

Proof sketch: Within $\epsilon_{sample}$ time of a client entering the system, a GPSupdate occurs and the client
transmits a heartbeat message. This message can take up to d time to be received by a nearby VSA, after which
it can take $e + ttl_{VtoV}$ time for the VSA to transmit the associated locUpdate message to the client’s home
locations and have the message be received, updating any alive home locations’ loc(p) entries. Since for any
interval of length $ttl_{hb} + d + 2e + ttl_{VtoV}$, at most f of the client’s home locations can be failed at any point
in the interval, all but f of the client’s home locations will receive a locUpdate message and have a non-null
loc(p) entry, and will remain alive with a non-null loc(p) entry for at least $ttl_{VtoV} + e$ after the next locUpdate
message is received (within $ttl_{hb} + d + e + ttl_{VtoV}$ time after the first was sent). Since this is true for each
locUpdate message, there can only be f home locations that either do not have a non-null loc(p) entry or
that will not be alive for an additional $ttl_{VtoV} + e$ time.

For the second statement, note that an alive client p will send a heartbeat message within $\epsilon_{sample}$ time of
arriving in a region, prompting updates to loc(p) at alive home locations within $d + e + ttl_{VtoV}$ time. Hence,
if a client is alive, any non-null entry for loc(p).reg can only be as old as $\epsilon_{sample} + d + e + ttl_{VtoV}$. □


Theorem 5.3 Every client p searching for a non-failed client q that has been in the system longer than
$ttl_{HLS} + \epsilon_{sample} + d + ttl_{VtoV} + e$ time will perform a HLreply(q, qreg) within time $ttl_{HLS}$, such that q was
located in region qreg no more than $ttl_{HLS}$ time ago. No reply will occur if q has been failed for more than
$ttl_{hb} + ttl_{HLS} − \epsilon_{sample}$ time. Any reply is in response to a query.

Proof sketch: For the first statement, by the previous lemma, we know that once client q has been in the
system for $\epsilon_{sample} + d + e + ttl_{VtoV}$ time, any queries of its home locations will succeed in producing a
result. However, a new HLquery request “piggybacks” on any prior unexpired HLquery requests. Since one
of these requests could have been initiated just before client q’s home locations are updated, we can only
guarantee that a response will be received for a new request if any outstanding requests will be answered. If the
client has been in the system for this total $ttl_{HLS} + d + e + ttl_{VtoV}$ time after receiving its first GPSupdate,
then any response to a query can take as much as $ttl_{HLS}$ time: $\epsilon_{sample}$ time for the querying client to receive
its first GPSupdate, d time for the query to be transmitted and received by a local VSA, $e + ttl_{VtoV}$ for the
local VSA to query a home location, $e + ttl_{VtoV}$ for the response to arrive at a local VSA, e time for the local
VSA to transmit the response to its requesting clients, and d time for the transmission to be received and
translated into HLreplys at clients. This total is $ttl_{HLS}$. As for the age of the response, by the prior lemma,
we know that information can only be out of date by $\epsilon_{sample} + ttl_{VtoV} + e + d$ time when a home location
responds to a query by another VSA. The response can take $e + ttl_{VtoV}$ time to arrive at the querying VSA,
followed by e + d time for the querying VSA to get the information to the clients that prompted the query.
The oldest the information could be is the total.
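
Reading the proof sketch term by term, the two totals can be written out as follows. This is a worked restatement of the sums named in the text, under the assumption that each listed delay is counted exactly once; it is not an additional result.

```latex
% Response time: GPS sample, client-to-VSA broadcast, query leg, reply leg,
% VSA output delay, VSA-to-client broadcast.
\begin{align*}
ttl_{HLS} &= \epsilon_{sample} + d + (e + ttl_{VtoV}) + (e + ttl_{VtoV}) + e + d \\
          &= \epsilon_{sample} + 2d + 3e + 2\,ttl_{VtoV}.
\end{align*}
% Age of the reply: staleness at the home location, reply leg, delivery to clients.
\begin{align*}
(\epsilon_{sample} + ttl_{VtoV} + e + d) + (e + ttl_{VtoV}) + (e + d)
          &= \epsilon_{sample} + 2d + 3e + 2\,ttl_{VtoV}.
\end{align*}
```

Both sums come out to the same expression, which is consistent with the theorem bounding both the response time and the age of the reported region by $ttl_{HLS}$.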


For the second statement, note that a failed client will not send a heartbeat message. Since loc(p) entries
are cleared once $ttl_{hb} + d + e + ttl_{VtoV}$ time has passed since the heartbeat message upon which it was based
was broadcast, and the information from the entry can only take as much as $e + ttl_{VtoV}$ time to reach a
querying VSA and e + d time to reach any querying clients, the total is the maximum time a HLreply can
occur after the client fails.


For the third statement, note that a query expires after $ttl_{HLS}$ time. Hence, any response generated
must be for a query that occurred no more than that time before. □
Theorem 5.4 Starting from an arbitrary configuration, after VtoVComm has stabilized, it takes
$\max(ttl_{HLS},\ 2e + 3ttl_{VtoV} + ttl_{hb} + 2d)$ time for HLS to stabilize.


Proof sketch: Once lower levels have stabilized, most client state is made locally consistent within $\epsilon_{sample}$
time, the time for the client to get a GPSupdate. This action resets most variables if the region is updated.
The remaining portions of client state are made consistent instantaneously with local correction actions, with
the exception of the heartbeat timer and queryq variables. The heartbeat timer can only affect operations
for at most $ttl_{hb}$ time. The queryq variable can only affect operations for $ttl_{HLS}$ time, when it would be
deleted.


For VSAs, there are two variables that are not instantaneously corrected: loc and lquery.


The loc variable will be consistent within time $e + 2ttl_{VtoV} + ttl_{hb} + d$. At worst, there could be a corrupted
message that arrives at a VSA after $ttl_{VtoV}$ time, adding a bad entry that takes $e + ttl_{VtoV} + ttl_{hb} + d$ time to
expire. If the client referred to is in the system, it might not be until the next update after the timestamp of
the corrupted message (which could have been delivered as late as $ttl_{VtoV}$ after corruptions stopped) arrives
for the information to be cleaned up. This time is exactly what the offset term for loc timeouts describes.
Hence, the variable might not be cleaned until $ttl_{VtoV}$ plus that offset term.


However, there may be responses based on this bad loc table information that were sent right at
$e + 2ttl_{VtoV} + ttl_{hb} + d$, and that take $e + ttl_{VtoV}$ to arrive at the VSA. The resulting transmission (taking d
time to complete) to local clients is then incorrect. However, those incorrect transmissions cease after the
total time $2e + 3ttl_{VtoV} + ttl_{hb} + 2d$ elapses.


The lquery variable is cleaned up within $ttl_{HLS}$ time. An entry in lquery only has a total of $2ttl_{VtoV} + 2e$
time in the data structure. It could be the case that a spurious request was transmitted in the beginning,
which adds d time. If a region response is received it results in immediate correction of the state through
erasure. Hence, the time required to be consistent is the time that it takes for a query to be accounted for.


The maximum of $ttl_{HLS}$ and $2e + 3ttl_{VtoV} + ttl_{hb} + 2d$ is the maximum stabilization time. □


5.4 Extensions 
Here we briefly describe some possible extensions to our HLS algorithm: 


Home location voting mechanisms: In systems where corruption failures are limited in number at the
VSA level, our implementation could be extended to use a voting mechanism, allowing information from
corrupted home locations to be weeded out. Rather than waiting for a single region response
from a home location VSA, querying VSAs could wait until the same region is returned from a majority of
home location VSAs. If corruption is limited to some small number of VSAs at a time, but can happen often,
then this voting mechanism can be used to provide a stronger location service, immune to this limited
number of faults.
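
As a rough illustration of the voting idea, a querying VSA could collect vlocReply regions per home location and only accept a region once a strict majority of the home locations agree on it. The function name `majority_region` and the strict-majority threshold are assumptions made for this sketch; the paper does not specify the voting rule in more detail.

```python
from collections import Counter


def majority_region(replies, num_home_locations):
    """Return the region reported by a strict majority of home locations, else None.

    replies: dict mapping a home location region to the client region it reported.
    num_home_locations: total number of home locations queried (f + 1 in the paper).
    """
    if not replies:
        return None
    region, votes = Counter(replies.values()).most_common(1)[0]
    return region if votes > num_home_locations // 2 else None
```

With three home locations, two agreeing replies are enough to accept a region, so a single corrupted home location cannot force a wrong answer by itself.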


Randomized asymmetric quorums: It is possible to have asymmetric updates and queries, such as with 
local updates to close-by VSAs and uniformly selected VSAs or vice versa (the expected number of VSAs 
that are required to be updated and queried is small, as proved in [22]). Instead of using a predefined set 
to query, one might use a randomized scheme based on [22], where a random set of regions is chosen for 
updating and inquiring about the location of a client node. Moreover, we could enhance the scheme in [22] 
by using a predefined set for location updates (such as the close-by regions) and random set for location 
queries (or vice versa). 


Attribute queries: There are scenarios in which one would like to query for client nodes with certain
attributes in a geographic area (e.g., a search for a medical doctor who is currently nearby). Our scheme
supports such queries in a natural way: Attributes can hash to home locations that store tables of clients 
with the attribute, and their locations. Clients searching for another nearby client with some attribute could 
then have a local VSA query home locations for the attribute, and select a nearby client from the list that 
is returned. 


6 Client end-to-end routing (EtoEComm) implementation 


Our implementation of the end-to-end routing service, EtoEComm, uses the location service to discover a
recent region location of a destination client node and then uses this location in conjunction with VtoVComm
to deliver messages (see Figure 9). As in the implementation of the Home Location Service, there are two
parts to the end-to-end routing implementation: the client-side portion and the VSA-side portion. Also as in
HLS, time and region information from the GPS oracle is used throughout this implementation to timestamp
and label information.


Figure 9: End-to-end routing. A client $C_p^{E2E}$ can send a message to another client $C_q^{E2E}$ by querying HLS
for q’s region, and then having local VSAs forward the message to q’s local regions through VtoVComm.
The message is received by those VSAs and broadcast for delivery by $C_q^{E2E}$.


The client-side portion $C_p^{E2E}$ takes a request to send a message to another client q, queries the HLS
for q’s location, and submits the message to have it sent by a VSA in its current or neighboring regions
to q’s location. It also takes messages originating at other clients and transmitted to it by its current or
neighboring regions’ VSAs, and delivers them.


The VSA $V_u^{E2E}$ portion is very simple. A client may send it information to be transmitted to other
VSAs, which it forwards through VtoVComm, or another VSA may send it information to be delivered at a
client in its own or a neighboring region, which it forwards through V-bcast.


6.1 EtoEComm client actions 


The signature, state, and actions of $C_p^{E2E}$ are in Figure 10. The main variable phbook is a table, indexed
on destination pid, with entries of the form ⟨reg, ttl, msg⟩. For a client q, phbook(q).reg stores the current
region of q (unless it is unknown, in which case it is ⊥). The field ttl stores a timeout for phbook(q).reg if
the region of q is known and stores a timeout for querying for the region if not. The set msg stores messages
being sent to q.

The GPSupdate(v, t) action (line 36) results in an update of the client’s reg variable to the region v
indicated in the action and a reset of the local clock.


A message m is sent to another client q via $send(q,m)_p$. This input to $C_p^{E2E}$ results in the forwarding
of the message to p’s current region u’s VSA through bcast(⟨⟨sdata, m, q, phbook(q).reg⟩, p, u⟩) if a region
phbook(q).reg for q is known (lines 44-45), or the saving of the message in phbook(q).msg, if the client does
not have the location of q (lines 46-48).

If a recent region for g is not known, Cee attempts to discover one. It queries HLS to determine where q 
was through the HLquery(q), action (line 50). A timeout for response to the location request, phbook(q).ttl, 
is set for ttlyrtgs later. If the timeout expires but no messages are waiting to be sent, cleanPhbook(q) erases 
the entry, preventing unnecessary HLquerying (line 63). 

Once a response to an HLquery(q) is received from HLS in the form of HLreply(q, greg)» (line 57), indicating 
q was in region greg, entry phbook(q).reg is updated to greg and phbook(q).ttl is updated to now + ttlpp, 
Constants:
  ttl_HLS
  ttl_pb

Signature:
  Input HLreply(q, v)_p, q ∈ P, v ∈ U
  Input send(q, m)_p, q ∈ P
  Input GPSupdate(v, t)_p, v ∈ U, t ∈ R
  Input brcv(((rdata, m), p, u))_p, u ∈ U
  Output bcast(m)_p
  Output HLquery(q)_p, q ∈ P
  Output receive(m)_p
  Internal cleanPhbook(q)_p, q ∈ P

State:
  analog now ∈ R, current real time, initially ⊥
  reg ∈ U, the current region, initially ⊥
  phbook, a table indexed on process id with entries from
    U × R × 2^msg, of the form (reg, ttl, msg), initially ∅
  sdataq, deliverq, queues of messages, initially ∅

Trajectories:
  satisfies
    d(now) = 1
    constant reg, phbook, sdataq, deliverq
  stops when
    Any precondition is satisfied.

Actions:
  Output bcast(((sdata, m, q, qreg), p, reg))_p
    Precondition:
      (m, q, qreg) ∈ sdataq ∧ reg ≠ ⊥
    Effect:
      sdataq ← sdataq \ {(m, q, qreg)}

  Input GPSupdate(v, t)_p
    Effect:
      now ← t
      if reg ≠ v then
        reg ← v

  Input send(q, m)_p
    Effect:
      if (phbook(q).reg ≠ ⊥ ∧ phbook(q).ttl > now) then
        sdataq ← sdataq ∪ {(m, q, phbook(q).reg)}
      else if (phbook(q) = null ∨ phbook(q).ttl < now) then
        phbook(q) ← (⊥, ⊥, {m})
      else phbook(q).msg ← phbook(q).msg ∪ {m}

  Output HLquery(q)_p
    Precondition:
      phbook(q) = (⊥, ttl, m ≠ ∅)
      ∧ (ttl = ⊥ ∨ ttl > now + ttl_HLS)
    Effect:
      phbook(q).ttl ← now + ttl_HLS

  Input HLreply(q, qreg)_p
    Effect:
      for each m ∈ phbook(q).msg
        sdataq ← sdataq ∪ {(m, q, qreg)}
      phbook(q) ← (qreg, now + ttl_pb, ∅)

  Internal cleanPhbook(q)_p
    Precondition:
      phbook(q) = (qreg, ttl, msg) ∧ [(qreg = ⊥ ∧ msg = ∅)
        ∨ (qreg ≠ ⊥ ∧ [ttl > now + ttl_pb ∨ msg ≠ ∅]) ∨ ttl < now]
    Effect:
      phbook(q) ← null

  Input brcv(((rdata, m), p, u))_p
    Effect:
      if u ∈ {reg} ∪ nbrs(reg)
        deliverq ← deliverq ∪ {m}

  Output receive(m)_p
    Precondition:
      m ∈ deliverq
    Effect:
      deliverq ← deliverq \ {m}

Figure 10: EtoEComm’s C^E2E_p automaton.


storing the location of q and setting a timeout for use of the location information. For each message waiting
to be sent to q in queue phbook(q).msg, the message, with the location information for the destination,
is forwarded to p’s current and neighboring regions’ VSAs through a bcast(((sdata, m, q, qreg), p, u))_p (lines
59-60, 30-34).

Messages for client p from other clients are received from p’s current region or a neighboring region v’s
VSA through brcv(((rdata, m), p, v))_p (line 70). The message m is subsequently delivered through the output
receive(m)_p (line 75).
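The client-side phonebook handling described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names `Client` and `Entry`, the constant values, the `queries` list standing in for emitted HLquery outputs, and the use of `None` for ⊥ are all hypothetical choices made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Optional

TTL_HLS = 5.0   # assumed value: wait this long for an HLS reply
TTL_PB = 20.0   # assumed value: a learned region stays usable this long


@dataclass
class Entry:
    reg: Optional[str] = None            # None plays the role of ⊥
    ttl: Optional[float] = None
    msgs: list = field(default_factory=list)


class Client:
    def __init__(self):
        self.now = 0.0
        self.phbook = {}     # destination id -> Entry
        self.sdataq = []     # (m, q, qreg) tuples ready for bcast
        self.queries = []    # stand-in for emitted HLquery outputs

    def send(self, q, m):
        e = self.phbook.get(q)
        if (e is not None and e.reg is not None
                and e.ttl is not None and e.ttl > self.now):
            self.sdataq.append((m, q, e.reg))        # fresh region known
        elif e is None or (e.ttl is not None and e.ttl < self.now):
            self.phbook[q] = Entry(msgs=[m])         # start a new lookup
        else:
            e.msgs.append(m)                         # lookup already pending

    def hl_query(self, q):
        e = self.phbook.get(q)
        # Enabled only with no region, pending messages, and a timeout
        # that is unset or implausibly far in the future.
        if (e is not None and e.reg is None and e.msgs
                and (e.ttl is None or e.ttl > self.now + TTL_HLS)):
            e.ttl = self.now + TTL_HLS
            self.queries.append(q)

    def hl_reply(self, q, qreg):
        e = self.phbook.get(q, Entry())
        self.sdataq.extend((m, q, qreg) for m in e.msgs)
        self.phbook[q] = Entry(reg=qreg, ttl=self.now + TTL_PB)

    def clean_phbook(self, q):
        e = self.phbook.get(q)
        if e is None:
            return
        expired = e.ttl is not None and e.ttl < self.now
        idle = e.reg is None and not e.msgs
        corrupt = e.reg is not None and (
            bool(e.msgs)
            or (e.ttl is not None and e.ttl > self.now + TTL_PB))
        if expired or idle or corrupt:
            del self.phbook[q]
```

In this sketch a first `send` to an unknown destination only queues the message; the message reaches `sdataq` once `hl_reply` supplies a region, and later sends within the `TTL_PB` window skip the lookup.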


6.2 EtoEComm VSA actions 
The signature, state, and actions of $V^{E2E}_u$ are in Figure 11.


The receipt of a message m to be sent from a client p to q at qreg through brcv(((sdata, m, q, qreg), p, v))_u,
where v is either u or a neighbor of u (line 33), results in the subsequent forwarding of the message to the virtual automata
at regions in calcregs(qreg) and their neighboring regions, via the virtual automata communication action
VtoVsend(qreg, (data, m, q))_u (lines 33-38). The set calcregs(qreg) contains the regions that q could occupy
by the time the message is delivered to it (since we do not require the client to be stationary during execution
of the algorithm). As will be seen shortly, the definition of calcregs depends on assumptions about client
mobility.


Likewise, the receipt, via VtoVrcv((data, m, p))_u (line 40), of message m intended for client p results in
Signature:
  Input VtoVrcv((data, m, p))_u, p ∈ P
  Input brcv(((sdata, m, q, qreg), p, v))_u, p, q ∈ P, qreg, v ∈ U
  Output bcast(m)_u
  Output VtoVsend(v, m)_u, v ∈ U

State:
  vtovq, a queue of tuples from U × msg, initially ∅
  bcastq, a queue of messages, initially ∅

Trajectories:
  satisfies
    constant vtovq, bcastq
  stops when
    Any precondition is satisfied.

function calcregs(v: U): 2^U =
  return nbrs(v) ∪ {v}

Actions:
  Output bcast(m)_u
    Precondition:
      m ∈ bcastq
    Effect:
      bcastq ← bcastq \ {m}

  Output VtoVsend(v, m)_u
    Precondition:
      (v, m) ∈ vtovq
    Effect:
      vtovq ← vtovq \ {(qreg, m)}

  Input brcv(((sdata, m, q, qreg), p, v))_u
    Effect:
      if v ∈ nbrs(u) ∪ {u} then
        let qregions = calcregs(qreg) in
          for each v ∈ qregions ∪ nbrs(qregions)
            vtovq ← vtovq ∪ {(qreg, (data, m, q))}

  Input VtoVrcv((data, m, p))_u
    Effect:
      bcastq ← bcastq ∪ {((rdata, m), p, u)}

Figure 11: EtoEComm’s V^E2E_u automaton.


the forwarding of the message to p via bcast(((rdata, m), p, u))_u (line 42).
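The VSA side of this exchange can be sketched in Python as follows. The sketch rests on assumptions that are not in the paper: regions are modeled as cells (x, y) of an unbounded square grid with eight neighbors each, and the names `VSA`, `brcv_sdata`, and `vtov_rcv_data` are hypothetical. It also addresses each queued copy to the individual target region, which is one reading of the prose description; Figure 11 as printed queues the pair (qreg, (data, m, q)).

```python
def nbrs(region):
    """Assumed geometry: the 8 grid cells surrounding `region`."""
    x, y = region
    cells = {(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)}
    return cells - {region}


def calcregs(qreg):
    # Definition used in Figure 11: the region itself plus its neighbors.
    return nbrs(qreg) | {qreg}


class VSA:
    def __init__(self, u):
        self.u = u
        self.vtovq = []    # (target region, message) pairs awaiting VtoVsend
        self.bcastq = []   # messages awaiting local broadcast

    def brcv_sdata(self, m, q, qreg, p, v):
        # Ignore broadcasts that did not originate in u or a neighbor of u.
        if v != self.u and v not in nbrs(self.u):
            return
        qregions = calcregs(qreg)
        targets = set(qregions)
        for r in qregions:
            targets |= nbrs(r)          # add neighbors of candidate regions
        for w in sorted(targets):
            self.vtovq.append((w, ("data", m, q)))

    def vtov_rcv_data(self, m, p):
        # Relay the payload to client p through the local broadcast service.
        self.bcastq.append((("rdata", m), p, self.u))
```

On this grid, `calcregs` yields a 3 x 3 block of cells, so adding the neighbors of each candidate region gives a 5 x 5 block of target regions per message.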


6.3 Correctness 
We make the system assumptions described in Section 3. Correctness of the EtoEComm implementation
is dependent on assumptions about client mobility and the definition of the function calcregs, used in the
EtoEComm VSA algorithm. We can prove correctness under either of the following two conditions:
(1) calcregs(qreg) returns the set containing qreg and its neighbors, and each client remains in a region for at
least $\epsilon_{sample} + 3ttl_{VtoV} + 5e + 4d + ttl_{pb}$ time before moving to a neighboring region, or
(2) calcregs(qreg) returns the set containing qreg and each region v such that the supremum distance between
any two points in v and qreg is at most $v_{max} \cdot (\epsilon_{sample} + 3ttl_{VtoV} + 5e + 4d + ttl_{pb})$.
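To make condition (2) concrete, the following Python sketch computes such a set for one particular geometry. It assumes regions are square grid cells of width `side` identified by integer (x, y) coordinates, which the conditions above do not require; the function name `calcregs_by_speed` and its parameters are hypothetical. Under that assumption the supremum distance between points of two cells offset by (dx, dy) is the distance between opposite corners, side * sqrt((|dx|+1)^2 + (|dy|+1)^2).

```python
import math


def calcregs_by_speed(qreg, v_max, staleness, side):
    """Regions a client last seen in qreg could occupy under condition (2).

    `staleness` is the time bound multiplied by v_max in condition (2).
    Assumes square grid cells of width `side` (an illustrative geometry).
    """
    reach = v_max * staleness
    k = int(reach // side)   # cells further than k rows/columns cannot qualify
    qx, qy = qreg
    result = {qreg}          # condition (2) always contains qreg itself
    for dx in range(-k, k + 1):
        for dy in range(-k, k + 1):
            # Farthest pair of points: opposite corners of the two cells.
            sup_dist = side * math.hypot(abs(dx) + 1, abs(dy) + 1)
            if sup_dist <= reach:
                result.add((qx + dx, qy + dy))
    return result
```

With unit cells and a reach of 3, for example, the set is the 3 x 3 block around qreg, matching the set that condition (1) uses on this grid.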

We then outline correctness for EtoEComm under these assumptions. For the first lemma and theorem,
assume we start in a safe configuration and no corruption failures occur.

Lemma 6.1 Consider an alive client q such that some other client p has a non-null, non-⊥ entry for
phbook(q).reg. If q does not fail for an additional $2d + 2e + ttl_{VtoV}$ time, then at any point in that interval,
q will be located in a region in calcregs(phbook(q).reg).

Proof sketch: First, we note that a non-null, non-⊥ entry phbook(q).reg has information that is at most
$\epsilon_{sample} + 2ttl_{VtoV} + 3e + 2d$ out-of-date (from HLS) when it is first installed, after which it is saved for an
additional $ttl_{pb}$ time.

If we are assuming condition 1, client q must be in the region indicated, or a neighboring region, and
will remain in those regions for an additional $2d + 2e + ttl_{VtoV}$ time. If we are assuming condition 2, at
any point up to $2d + 2e + ttl_{VtoV}$ later, client q can be in any region reachable from qreg in the total
$\epsilon_{sample} + 3ttl_{VtoV} + 5e + 4d + ttl_{pb}$ time, when traveling at speed $v_{max}$. □
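The total time bound used in conditions (1) and (2) is the sum of the three intervals named in the proof sketch; the labels under the braces are added here for readability and are not the paper's terminology:

```latex
\begin{align*}
\underbrace{\epsilon_{sample} + 2\,ttl_{VtoV} + 3e + 2d}_{\text{age when installed}}
\;+\; \underbrace{ttl_{pb}}_{\text{retention in phbook}}
\;+\; \underbrace{2d + 2e + ttl_{VtoV}}_{\text{interval of Lemma 6.1}}
\;=\; \epsilon_{sample} + 3\,ttl_{VtoV} + 5e + 4d + ttl_{pb}.
\end{align*}
```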


Theorem 6.2 Consider a client p that performs a send(q, m)_p and does not change regions for $ttl_{HLS}$ time.
If client q has been in the system for $ttl_{HLS} + \epsilon_{sample} + d + ttl_{VtoV} + e$ time and does not fail, then q will
perform a receive(m) within $ttl_{HLS} + 2d + 2e + ttl_{VtoV}$ time. If a client receives a message, it must previously
have been sent to it.


Theorem 6.3 Starting from an arbitrary configuration, after HLS has stabilized, it takes $ttl_{pb} + 2d + 2e +
ttl_{VtoV}$ time for EtoEComm to stabilize.

Proof sketch: Bad region information can be in phbook for up to $ttl_{pb}$ time, and messages sent using this
information are not delivered and cleared until up to $d + e + ttl_{VtoV} + e + d$ later. At the same time, while HLS
has been stabilizing, phbook’s message collection can take up to $ttl_{HLS}$ time to be cleared. The maximum
of these quantities is the time for EtoEComm to stabilize. □
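The two timing bounds can be evaluated numerically with a short Python sketch. The function and parameter names are hypothetical and the sample values in the usage note are arbitrary. The second function follows the proof sketch literally by taking the maximum of the two clearing times; this equals the bound stated in Theorem 6.3 whenever $ttl_{HLS}$ does not exceed $ttl_{pb} + 2d + 2e + ttl_{VtoV}$, which is an inference from the statement rather than something the text spells out.

```python
def delivery_bound(ttl_hls, ttl_vtov, d, e):
    """Theorem 6.2: upper bound on the time from send(q, m) to receive(m)."""
    return ttl_hls + 2 * d + 2 * e + ttl_vtov


def stabilization_bound(ttl_pb, ttl_hls, ttl_vtov, d, e):
    """Proof sketch of Theorem 6.3: the larger of the two clearing times."""
    # Bad region data lives up to ttl_pb, then messages sent with it
    # take up to d + e + ttl_vtov + e + d to be delivered and cleared.
    stale_region = ttl_pb + (d + e + ttl_vtov + e + d)
    # Messages queued in phbook while HLS stabilized clear within ttl_hls.
    stale_messages = ttl_hls
    return max(stale_region, stale_messages)
```

For instance, with d = 1, e = 2, ttl_vtov = 10, ttl_hls = 30, and ttl_pb = 60 (arbitrary time units), `delivery_bound` gives 46 and `stabilization_bound` gives 76.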

6.4 Extensions
Here we briefly describe some possible extensions to our EtoEComm algorithm: 


Routing optimizations: Once the location of a client is known, communication with the client can be
continued directly, and movements during the conversation may be piggy-backed on the information transferred
in order to update the destination according to the move (as suggested in [12]). We also note that we
can use an embedded tree location scheme such as the one in [12], implemented by virtual automata, where
intermediate tree nodes are also mapped to regions.


Sleeping client messaging service: Mobile clients might be able to shut down to conserve power. We 
could guarantee that a sleeping client eventually receives messages intended for it by having local VSAs save 
the messages. The VSAs then, at predefined times, broadcast the messages. Sleeping clients awake for these 
broadcasts, receive their messages, and can go to sleep again afterwards. 
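One way the sleeping-client idea could look on the VSA side is sketched below in Python. Everything here is hypothetical: the class name `SleepingMailbox`, the fixed wake-up `period`, and the `tick` method are illustrative assumptions, since the extension is only described informally above.

```python
class SleepingMailbox:
    """Hypothetical VSA-side buffer for the sleeping-client extension."""

    def __init__(self, period):
        self.period = period      # assumed: broadcasts every `period` time units
        self.next_wake = period   # next predefined broadcast time
        self.saved = {}           # client id -> list of saved messages

    def save(self, p, m):
        # Hold the message until the next predefined broadcast time.
        self.saved.setdefault(p, []).append(m)

    def tick(self, now):
        """Return the (client, message) pairs to broadcast at time `now`."""
        if now < self.next_wake:
            return []
        out = [(p, m) for p, msgs in self.saved.items() for m in msgs]
        self.saved.clear()
        # Advance to the first broadcast time strictly after `now`.
        while self.next_wake <= now:
            self.next_wake += self.period
        return out
```

A client following the same schedule would wake at multiples of `period`, collect whatever `tick` releases for its identifier, and go back to sleep.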


7 Concluding remarks 


We described how both the GPS oracle and the VSA programming layer could help implement self-stabilizing 
geocast routing, location management, and end-to-end routing services. The self-stabilizing VSA layer 
provides a virtual fixed infrastructure useful for solving a variety of problems. It acts as a fault-tolerant, 
self-stabilizing building block for services, allowing applications to be built for mobile networks as though 
base stations existed for mobile clients to interact with. 


The GPS oracle’s frequently refreshed and reliable timing and location information made providing
self-stabilization easier. We believe the paradigm of an external service providing reliable information that can
be used in a self-stabilizing service implementation is an especially important and relevant one in mobile
networks. Mobile networks demonstrate many properties that naturally require self-stabilizing implementations,
such as a need for self-configuration, or the possibility of unpredictable kinds of failures, but also
often have access to reliable external knowledge that can act as a source of shared consistency in the network;
here, accurate region knowledge allowed nodes to determine who they should be communicating with
(current region and neighboring region nodes), and time information allowed them to order messages and
assess timeliness of information.